42 research outputs found

    An auxiliary Part-of-Speech tagger for blog and microblog cyber-slang

    The increasing impact of Web 2.0 involves a growing usage of slang, abbreviations, and emphasized words, which limit the performance of traditional natural language processing models. State-of-the-art Part-of-Speech (POS) taggers are often unable to assign a meaningful POS tag to every word in a Web 2.0 text. To address this limitation, we propose an auxiliary POS tagger that assigns a POS tag to a given token based on the sequence of preceding and following POS tags. The main advantage of the proposed auxiliary POS tagger is that it does not need information about the token itself, since it relies only on the sequences of existing POS tags. The tagger is called auxiliary because it requires an initial POS tagging step, which might be performed using online dictionaries (e.g., Wiktionary) or other POS tagging algorithms. The auxiliary POS tagger relies on a Bayesian network that uses information about preceding and following POS tags. It was evaluated on the Brown Corpus, a general linguistics corpus; on the modern ARK dataset, composed of Twitter messages; and on a corpus of manually labeled Web 2.0 data.
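The abstract describes the auxiliary tagger only at a high level (a Bayesian network over neighbouring POS tags). The Python sketch below is a much simpler, hypothetical stand-in, not the authors' network: it fills tokens left untagged by a first-pass tagger (marked `None`) with the tag most often seen between the same preceding and following tags in a training corpus, backing off to the preceding tag alone and then to the overall most frequent tag. The function names, the boundary pseudo-tags, and the tag set are illustrative assumptions.

```python
from collections import Counter, defaultdict

START, END = "<S>", "</S>"  # hypothetical sentence-boundary pseudo-tags


def train_context_model(tagged_sentences):
    """Count which tags occur between each (previous tag, next tag) pair."""
    by_context = defaultdict(Counter)  # (prev, next) -> counts of the middle tag
    by_prev = defaultdict(Counter)     # prev -> counts of the following tag (back-off)
    overall = Counter()                # global tag counts (last-resort back-off)
    for tags in tagged_sentences:
        padded = [START, *tags, END]
        for i in range(1, len(padded) - 1):
            prev_tag, tag, next_tag = padded[i - 1], padded[i], padded[i + 1]
            by_context[(prev_tag, next_tag)][tag] += 1
            by_prev[prev_tag][tag] += 1
            overall[tag] += 1
    return by_context, by_prev, overall


def fill_missing_tags(tags, model):
    """Replace None entries (tokens the first-pass tagger could not label)."""
    by_context, by_prev, overall = model
    result = list(tags)
    for i, tag in enumerate(result):
        if tag is not None:
            continue
        prev_tag = result[i - 1] if i > 0 else START  # may itself be a filled-in tag
        next_tag = tags[i + 1] if i + 1 < len(tags) else END
        # .get() avoids inserting empty entries into the defaultdicts.
        for counts in (by_context.get((prev_tag, next_tag)), by_prev.get(prev_tag), overall):
            if counts:
                result[i] = counts.most_common(1)[0][0]
                break
    return result
```

For example, after training on `[["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN", "VERB"], ["PRON", "VERB", "DET", "NOUN"]]`, the call `fill_missing_tags(["DET", None, "VERB"], model)` returns `["DET", "NOUN", "VERB"]`. Like the tagger in the abstract, the sketch never looks at the unknown token's spelling, only at its tag context.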

    Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers

    This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is relevant for filtering financial-domain tweets (e.g., about alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures, and modern machine learning methods, including support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, which obtained discrimination levels of 80% and 71% when tested with 11,081 and 3,000 manually labeled tweets, respectively. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained predictive user discrimination levels of 80% and 75%, respectively, when tested with a manually labeled sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for selecting Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users).
    Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia.
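The abstract names a one-class TF-IDF classifier (TF-IDFC) trained only on in-domain news titles but does not define it. One plausible reading, sketched below purely as an assumption, scores a tweet by the cosine similarity between its TF-IDF vector and the mean TF-IDF vector of the training titles. The class name, the crude tokenizer, the smoothed IDF formula, and the 0.2 threshold are all hypothetical choices, not the paper's.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case alphabetic tokens; a deliberately crude tokenizer for the sketch."""
    return re.findall(r"[a-z]+", text.lower())


class CentroidTfidfScorer:
    """Hypothetical one-class scorer: cosine similarity to the mean TF-IDF vector
    of in-domain training texts (e.g. financial news titles)."""

    def __init__(self, titles):
        docs = [Counter(tokenize(t)) for t in titles]
        n = len(docs)
        doc_freq = Counter(term for d in docs for term in d)
        # Smoothed IDF: terms occurring in every title still get a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
        centroid = defaultdict(float)
        for d in docs:
            for term, weight in self._tfidf(d).items():
                centroid[term] += weight / n
        self.centroid = dict(centroid)
        self.centroid_norm = math.sqrt(sum(w * w for w in self.centroid.values()))

    def _tfidf(self, counts):
        # Out-of-vocabulary terms get weight 0: they cannot match the centroid.
        return {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}

    def score(self, text):
        vec = self._tfidf(Counter(tokenize(text)))
        dot = sum(w * self.centroid.get(t, 0.0) for t, w in vec.items())
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return dot / (norm * self.centroid_norm) if norm and self.centroid_norm else 0.0

    def is_in_domain(self, text, threshold=0.2):
        # The threshold is an arbitrary placeholder; it would need tuning on labeled data.
        return self.score(text) >= threshold
```

Because only in-domain titles are needed, this kind of scorer fits the one-class setting described above: no negative (non-financial) examples are required at training time.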

    A Google trends spatial clustering approach for a worldwide Twitter user geolocation

    User location data are valuable for diverse social media analytics. In this paper, we address the nontrivial task of estimating the worldwide city-level location of a Twitter user by considering only historical tweets. We propose a purely unsupervised approach based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and on three clustering algorithms. The approach was validated empirically using a recently collected dataset with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast and correctly predicts the ground-truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively.
    The work of P. Cortez was supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. We would also like to thank the anonymous reviewers for their helpful suggestions.
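The abstract names DBSCAN as the best of three clustering algorithms but gives no further detail. The Python sketch below illustrates the general idea under assumptions of our own: the Google Trends step is assumed to have already produced a list of (latitude, longitude) sample points for a user's nouns (for instance, each city repeated in proportion to its GT frequency), DBSCAN with a great-circle distance is run on them, and the mean of the largest cluster is returned as the location estimate. The 250 km radius and `min_samples=3` are placeholders, not the paper's settings.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def dbscan(points, eps_km, min_samples):
    """Plain O(n^2) DBSCAN; returns one cluster label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:  # core point: keep expanding
                queue.extend(j_seeds)
    return labels


def estimate_location(points, eps_km=250.0, min_samples=3):
    """Return the mean (lat, lon) of the largest DBSCAN cluster, or None if all points are noise."""
    labels = dbscan(points, eps_km, min_samples)
    clustered = [label for label in labels if label >= 0]
    if not clustered:
        return None
    largest = max(set(clustered), key=clustered.count)
    members = [p for p, label in zip(points, labels) if label == largest]
    # Naive coordinate mean: adequate for compact clusters away from the antimeridian.
    return (sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members))
```

A density-based method suits this setting because scattered, isolated samples (nouns popular in unrelated cities) are labeled as noise instead of dragging the estimate toward them.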

    The Fruit Fly, Drosophila melanogaster: The Making of a Model (Part I)

    The fruit fly, Drosophila melanogaster (Meigen, 1830), has been established as a cornerstone for research into a wide array of subjects, including diseases, development, physiology, and genetics. Thanks to an abundance of genetic tools, publicly available fly stocks, and databases, as well as its considerable biological similarity to mammalian systems, Drosophila has become a key model organism for elucidating many aspects of human disease. Herein we present an overview of what makes Drosophila such an appealing model organism. In Part I of this chapter, basic Drosophila biology is reviewed and the most relevant genetic tools available to Drosophila researchers are covered. Then, in Part II, we outline the wide array of pathologies that have been studied using Drosophila as a model organism, along with key advances made in each field using the fly.

    The Fruit Fly, Drosophila melanogaster: Modeling of Human Diseases (Part II)

    The fruit fly, Drosophila melanogaster (Meigen, 1830), has been established as a key model organism thanks in part to its considerable biological similarity to mammals and an abundance of available genetic tools. Drosophila has been used to model many human disease states and has been critical in elucidating the genetic mechanisms contributing to them. Part I of this chapter covered basic Drosophila biology and relevant genetic tools available to Drosophila researchers. Here, in Part II, we review the use of Drosophila as a model organism to study neurodegenerative disorders, cardiovascular diseases, kidney diseases, cancer, metabolic disorders, and immune disorders, as well as key findings made in those fields thanks to Drosophila research.

    Twitter user geolocation using web country noun searches

    Several Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, geolocating its users is a nontrivial task. This paper presents a purely word distribution method for Twitter user country geolocation. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted using a recently created dataset of 744,830 tweets produced by 3,298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art word distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with 80% accuracy while being much faster than GTN.
    Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the anonymous reviewers for their helpful suggestions.
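The abstract does not state which statistic GTN uses to match noun frequencies against Google Trends country distributions. As a hypothetical illustration only, the sketch below assumes each noun's GT interest has already been retrieved as a per-country dictionary, normalizes it to a distribution, weights it by how often the user wrote that noun, and predicts the country with the highest summed score. The function name, the aggregation rule, and the input format are assumptions; no Google Trends API is called here.

```python
from collections import Counter


def gtn_country(nouns, country_interest):
    """Predict a user's country from their tweet nouns.

    nouns: list of nouns extracted from the user's tweets (repeats allowed).
    country_interest: noun -> {country_code: relative interest}, assumed to have
    been fetched from Google Trends beforehand (hypothetical format).
    """
    freq = Counter(nouns)
    scores = Counter()
    for noun, count in freq.items():
        dist = country_interest.get(noun)
        if not dist:
            continue  # no Trends data for this noun
        total = sum(dist.values())
        if total == 0:
            continue
        for country, value in dist.items():
            # Normalize each noun's interest to a distribution, weighted by noun frequency.
            scores[country] += count * value / total
    return scores.most_common(1)[0][0] if scores else None
```

With interest data `{"tram": {"PT": 80, "DE": 20}, "pastel": {"PT": 90, "BR": 10}, "football": {"BR": 50, "PT": 25, "DE": 25}}`, the nouns `["tram", "pastel", "football"]` give PT the highest summed score. The per-query cost visible here (one Trends lookup per distinct noun) is the cost that the GTN2 variant is described as reducing.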

    Social media cross-source and cross-domain sentiment classification

    Due to the expansion of the Internet and the Web 2.0 phenomenon, there is a growing interest in the sentiment analysis of freely opinionated text. In this paper, we propose a novel cross-source cross-domain sentiment classification, in which cross-domain labeled Web sources (Amazon and Tripadvisor) are used to train supervised learning models (including two deep learning algorithms) that are tested on typically unlabeled social media reviews (Facebook and Twitter). We explored a three-step methodology, in which distinct balanced training, text preprocessing and machine learning methods were tested, using two languages: English and Italian. The best results were achieved when using undersampling training and a Convolutional Neural Network. Interesting cross-source classification performance was achieved, in particular when using Amazon and Tripadvisor reviews to train a model that is tested on Facebook data for both English and Italian.
    Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the three anonymous reviewers for their helpful suggestions.
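The abstract reports that undersampling gave the best balanced training but does not describe the exact procedure. A minimal sketch of plain random undersampling, assuming it means randomly discarding examples from larger classes until every class matches the smallest one, could look like this (the function name and the fixed seed are illustrative choices):

```python
import random
from collections import defaultdict


def undersample(texts, labels, seed=0):
    """Randomly keep as many examples per class as the smallest class has."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    if not by_label:
        return [], []
    n_min = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        for text in rng.sample(items, n_min):  # sampling without replacement
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels
```

Undersampling discards data rather than synthesizing it, so it keeps training sets small and avoids duplicating minority-class reviews; the trade-off is that some majority-class information is thrown away.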

    Micro-Bypass Implantation for Primary Open-Angle Glaucoma Combined with Phacoemulsification: 4-Year Follow-Up

    Purpose. To report the long-term follow-up results in patients with cataract and primary open-angle glaucoma (POAG) randomly assigned to cataract surgery combined with micro-bypass stent implantation or phacoemulsification alone. Methods. Thirty-six subjects with cataract and POAG were randomized in a 1:2 ratio to either iStent implantation and cataract surgery (combined group) or cataract surgery alone (control group). Twenty-four subjects agreed to be evaluated again 48 months after surgery. Patients returned one month later for an unmedicated washout assessment. Results. At the long-term follow-up visit, mean intraocular pressure (IOP) was 15.9 ± 2.3 mmHg in the iStent group and 17 ± 2.5 mmHg in the control group (p = NS). After washout, the 14.2% between-group difference in mean IOP reduction, in favour of the combined group, was statistically significant (p = 0.02). A significant reduction in the mean number of medications was observed in both groups compared with baseline values (p = 0.005 in the combined group and p = 0.01 in the control group). Conclusion. Patients in the combined group maintained low IOP levels after long-term follow-up, whereas cataract surgery alone showed a loss of efficacy in controlling IOP over time. Both treatments reduced the number of prescribed ocular hypotensive medications. This trial is registered under NCT00847158.

    Multifrequency Photo-polarimetric WEBT Observation Campaign on the Blazar S5 0716+714: Source Microvariability and Search for Characteristic Timescales

    Here we report on the results of the WEBT photo-polarimetric campaign targeting the blazar S5 0716+71, organized in March 2014 to monitor the source simultaneously in BVRI and near-IR filters. The campaign resulted in an unprecedented dataset spanning ~110 h of nearly continuous, multi-band observations, including two sets of densely sampled polarimetric data, mainly in the R filter. During the campaign, the source displayed pronounced variability, with peak-to-peak variations of about 30% and "bluer-when-brighter" spectral evolution, consisting of a day-timescale modulation with superimposed hour-long microflares characterized by ~0.1 mag flux changes. We performed an in-depth search for quasi-periodicities in the source light curve; hints of oscillations on timescales of ~3 h and ~5 h do not represent highly significant departures from a pure red-noise power spectrum. We observed that, at a certain configuration of the optical polarization angle relative to the position angle of the innermost radio jet in the source, changes in the polarization degree led the total flux variability by about 2 h; when the relative configuration of the polarization and jet angles changed, no such lag could be noted. The microflaring events, when analyzed as separate pulse emission components, were found to be characterized by a very high polarization degree (>30%) and by polarization angles that differed substantially from the polarization angle of the underlying background component, or from the radio jet position angle. We discuss the results in the general context of blazar emission and energy dissipation models.
    Comment: 16 pages, 17 figures; ApJ accepted
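The finding that polarization-degree changes led the total flux by about 2 h is a time-lag measurement. As a hedged illustration of the general technique (not the authors' analysis), the sketch below shifts one evenly sampled series against another and keeps the shift with the highest Pearson correlation. Real campaign light curves are unevenly sampled, so in practice an interpolation or discrete-correlation-function approach would be needed; the even sampling and the function names here are simplifying assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def best_lag(leader, follower, max_lag):
    """Return (k, r): the shift k in samples maximising corr(leader[t], follower[t + k]).

    Positive k means `follower` lags `leader` by k samples. Both series must be
    evenly sampled on the same time grid and have equal length.
    """
    if len(leader) != len(follower):
        raise ValueError("series must have equal length")
    n = len(leader)
    best_k, best_r = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = leader[:n - k], follower[k:]
        else:
            a, b = leader[-k:], follower[:n + k]
        r = pearson(a, b)
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r
```

The lag in time units is `k` multiplied by the sampling interval; with a hypothetical 30-minute cadence, for instance, `k = 4` would correspond to 2 h.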